june 23, 2020

opening

house-keeping

  • unmute your mic to say hello
  • mute mic, cams off
  • 4hrs with breaks
  • there will be exercises
  • write questions in the chat
  • R and RStudio need to be installed: install R and RStudio
  • we use tidyverse (which loads ggplot2 as well)
  • basics assumed but do ask

workshop folder

  • go to github: https://github.com/jensroes/visualisation-workshop
  • download repository: visualisation-workshop
  • unpack folder (if downloaded as zip)
  • contents:
    • data
    • exercises
    • scripts
    • slides
    • visualisation-workshop.Rproj
  • double-click on visualisation-workshop.Rproj

outline

  • principles of data visualisation
  • grammar of graphics
  • aesthetics and attributes
  • geometries
  • major tools of data visualisation
  • cosmetics
  • closing remarks
  • references

what is data visualisation?

  • statistics: graphical data analysis
  • design: communication and perception
  • exploratory plots: confirm and analyse data (small specialist audience)
  • explanatory plots: inform and persuade (wide audience)
  • advice: think about your audience

exploring data

library(tidyverse)  # provides read_csv(), glimpse() and ggplot2
horses <- read_csv("../data/horses.csv")
glimpse(horses)
Rows: 50
Columns: 6
$ X1      <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18…
$ HorseID <dbl> 97, 156, 56, 139, 65, 184, 88, 182, 101, 135, 35, 39, 198, 10…
$ Price   <dbl> 38000, 40000, 10000, 12000, 25000, 35000, 35000, 12000, 22000…
$ Age     <dbl> 3, 5, 1, 8, 4, 8, 5, 17, 4, 6, 7, 7, 14, 6, 3, 6, 6, 12, 7, 7…
$ Height  <dbl> 16.75, 17.00, NA, 16.00, 16.25, 16.25, 16.50, 16.75, 17.25, 1…
$ Sex     <chr> "m", "m", "m", "f", "m", "f", "m", "f", "m", "f", "m", "f", "…

scatter plot: relationship between age and price

ggplot(data = horses, aes(x = Age, y =  Price)) 

scatter plot: relationship between age and price

ggplot(data = horses, aes(x = Age, y =  Price)) +
  geom_point()  

scatter plot: linear model

ggplot(data = horses, aes(x = Age, y = Price)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)

scatter plot: quadratic model

ggplot(data = horses, aes(x = Age, y = Price)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x + I(x^2), se = FALSE)
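
The formula given to geom_smooth() is an ordinary model formula, so the same quadratic model can be fitted directly with lm(). A minimal sketch, assuming a made-up data frame `toy` (not part of the workshop data) whose values follow an exact quadratic, so the fitted coefficients are easy to check:

```r
# Hypothetical data that follow y = 1 + 2x + 3x^2 exactly
toy <- data.frame(x = 1:10)
toy$y <- 1 + 2 * toy$x + 3 * toy$x^2

# I() protects x^2 so that ^ is read as arithmetic squaring,
# not as a formula operator
fit <- lm(y ~ x + I(x^2), data = toy)

# Intercept, linear and quadratic coefficient
coef(fit)
```

Without I(), the term x^2 would be interpreted by the formula syntax and collapse to x, giving a straight line again.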

scatter plot: grouping variable

ggplot(data = horses, aes(x = Age, y = Price, colour = Sex)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x + I(x^2), se = FALSE)

scatter plot: explanatory plot

ggplot(data = horses, aes(x = Age, y = Price, colour = Sex)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE, formula = y ~ x + I(x^2), fullrange = TRUE) +
  ggthemes::theme_clean() +
  scale_y_continuous(labels = scales::dollar_format(prefix = "$")) +
  ggthemes::scale_color_colorblind(labels = c("Female", "Male") ) +
  labs(y = "Price (in US Dollar)", 
       x = "Age (in years)") +
  theme(legend.position = "bottom",
        legend.justification = "right",
        axis.title = element_text(hjust = 0))

why data visualisation?

why data visualisation?

“[data visualization] forces us to notice what we never expected to see.” (Tukey 1977)

  • exploring structures in the data (e.g. relationship between two variables)
  • understanding patterns (beyond descriptives)
  • selecting appropriate stats
  • communication of findings
  • persuasion of audience

Anscombe’s quartet

Anscombe (1973) and Tufte (1989)

Data set   Mean (x)   SD (x)   Mean (y)   SD (y)
1          9          3.32     7.5        2.03
2          9          3.32     7.5        2.03
3          9          3.32     7.5        2.03
4          9          3.32     7.5        2.03
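
The summary statistics for the quartet can be recomputed from the `anscombe` data set that ships with base R (columns x1 to x4 and y1 to y4). A minimal sketch; the name `summary_table` is chosen here for illustration:

```r
# anscombe is part of the datasets package, which is loaded by default
x_cols <- paste0("x", 1:4)
y_cols <- paste0("y", 1:4)

summary_table <- data.frame(
  data_set = 1:4,
  mean_x   = sapply(anscombe[x_cols], mean),
  sd_x     = sapply(anscombe[x_cols], sd),
  mean_y   = sapply(anscombe[y_cols], mean),
  sd_y     = sapply(anscombe[y_cols], sd),
  row.names = NULL
)

# Rounded to two decimals, the four data sets are indistinguishable
round(summary_table, 2)
```

Only plotting the four x/y pairs shows how different the data sets really are.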

the datasaurus dozen

Matejka and Fitzmaurice (2017): see link

principles of data visualisation

basic principles

  • no “one fits all” method
  • some methods are more informative than others
  • maximise what we can learn from data

basic principles

  • going beyond summary statistics
  • descriptive summary statistics may conceal / obscure important patterns
  • appropriate stats
  • prevent wrong conclusions about data / theory
  • visualisation helps us to understand patterns, structures, relationships
  • see e.g. Anscombe’s Quartet

basic principles

Hartwig and Dearing (1979)

  • skepticism: any visualization might obscure or misrepresent data
  • openness: there might be patterns and structures that we were not expecting

basic principles

Tufte (1983)

  • above all else show the data
  • avoid distorting what the data have to say
  • present many numbers in a small space
  • encourage the eye to compare different pieces of data
  • reveal data at several levels of detail, from broad overview to fine structures

exercise 1

  • data set: mammals (Allison and Cicchetti 1976; Weisberg 1985)
  • average brain (in g) and body weights (in kg) for 62 species of land mammals.
mammals <- read_csv("../data/mammals.csv") %>%
  rename(species = X1)

glimpse(mammals)
Rows: 62
Columns: 3
$ species <chr> "Arctic fox", "Owl monkey", "Mountain beaver", "Cow", "Grey w…
$ body    <dbl> 3.385, 0.480, 1.350, 465.000, 36.330, 27.660, 14.830, 1.040, …
$ brain   <dbl> 44.50, 15.50, 8.10, 423.00, 119.50, 115.00, 98.20, 5.50, 58.0…

exercise 1

creating (gg)plots in R

  • open script exercises/Exercise 1.R
  • read and follow the instructions in the comments.
  • you will need to replace the "_ _ _"s accordingly.
  • run your code (not the entire script): CTRL+Enter

grammar of graphics

grammar of graphics

Wilkinson (1999)

  • graphics are built on an underlying grammar
  • system of rules for mapping variables to properties to visualise data
  • i.e. ingredients (1) and the recipe (2)
  • principle 1: graphics consist of distinct layers of grammatical elements (data, aesthetics, geometries)
  • principle 2: … are built around aesthetic mappings

grammar of graphics

  • ggplot2 builds on these principles (Wickham 2016, 2010)
  • higher-level plotting system compared to base R functions (e.g. plot(), hist())
  • complex visualisations can be created with a minimal amount of code
  • integration of statistical information

grammatical elements

  • data: name of the data variable
  • aesthetics: mapping between data and graphic properties (axes, size, colour) indicated as aes()
  • geometries: visual elements encoding the data indicated as geom_…()

ggplot(data = horses, mapping = aes(y = Price, x = Age, colour = Sex))

ggplot(data = horses, mapping = aes(y = Price, x = Age, colour = Sex)) +
  geom_point()

ggplot(data = horses, mapping = aes(y = Price, x = Age, colour = Sex)) +
  geom_smooth(method = "lm") 

ggplot(data = horses, mapping = aes(y = Price, x = Age, colour = Sex)) +
  geom_point() +
  geom_smooth(method = "lm") 

optional grammatical elements

  • facets: dividing data into subplots
  • statistics: summarising representations
  • coordinates: plotting space
  • theme: visual properties not related to the data (font, background)

ggplot2: layers of grammatical elements

  • data
  • aesthetics
ggplot(data = horses, aes(y = Price, x = Age))

ggplot2: layers of grammatical elements

  • data
  • aesthetics
  • geometries
ggplot(data = horses, aes(y = Price, x = Age)) +
  geom_point(size = .25)

ggplot2: layers of grammatical elements

  • data
  • aesthetics
  • geometries
  • facets
ggplot(data = horses, aes(y = Price, x = Age)) +
  geom_point(size = .25) +
  facet_grid( ~ Sex)

ggplot2: layers of grammatical elements

  • data
  • aesthetics
  • geometries
  • facets
  • statistics
ggplot(data = horses, aes(y = Price, x = Age)) +
  geom_point(size = .25) +
  facet_grid( ~ Sex) +
  stat_smooth(method = "lm", se = FALSE, fullrange = TRUE) 

ggplot2: layers of grammatical elements

  • data
  • aesthetics
  • geometries
  • facets
  • statistics
  • coordinates
ggplot(data = horses, aes(y = Price, x = Age)) +
  geom_point(size = .25) +
  facet_grid( ~ Sex) +
  stat_smooth(method = "lm", se = FALSE, fullrange = TRUE) +
  coord_fixed(ratio = 2/25000)

ggplot2: layers of grammatical elements

  • data
  • aesthetics
  • geometries
  • facets
  • statistics
  • coordinates
ggplot(data = horses, aes(y = Price, x = Age)) +
  geom_point(size = .25) +
  facet_grid( ~ Sex) +
  stat_smooth(method = "lm", se = FALSE, fullrange = TRUE) +
  coord_trans(x = "log", y = "reverse")

ggplot2: layers of grammatical elements

  • data
  • aesthetics
  • geometries
  • facets
  • statistics
  • coordinates
ggplot(data = horses, aes(y = Price, x = Age)) +
  geom_point(size = .25) +
  facet_grid( ~ Sex) +
  stat_smooth(method = "lm", se = FALSE, fullrange = TRUE) +
  coord_flip()

ggplot2: layers of grammatical elements

  • data
  • aesthetics
  • geometries
  • facets
  • statistics
  • coordinates
  • theme
ggplot(data = horses, aes(y = Price, x = Age)) +
  geom_point(size = .25) +
  facet_grid( ~ Sex) +
  stat_smooth(method = "lm", se = FALSE, fullrange = TRUE) +
  theme_dark() 

ggplot2: layers of grammatical elements

  • data
  • aesthetics
  • geometries
  • facets
  • statistics
  • coordinates
  • theme
ggplot(data = horses, aes(y = Price, x = Age)) +
  geom_point(size = .25) +
  facet_grid( ~ Sex) +
  stat_smooth(method = "lm", se = FALSE, fullrange = TRUE) +
  theme(strip.text = element_text(hjust = 0),
        panel.background = element_blank()) 

exercise 2

based on Davis (1990) and Fox and Weisberg (2011)

weight <- read_csv("../data/weight.csv") 
glimpse(weight)
Rows: 6,067
Columns: 8
$ subjectid         <dbl> 10027, 10032, 10033, 10092, 10093, 10115, 10117, 10…
$ gender            <chr> "Male", "Male", "Male", "Male", "Male", "Male", "Ma…
$ height            <dbl> 177.6, 170.2, 173.5, 165.5, 191.4, 172.0, 181.0, 18…
$ height_selfreport <dbl> 180.34, 172.72, 172.72, 167.64, 195.58, 175.26, 182…
$ weight            <dbl> 81.5, 72.6, 92.9, 79.4, 94.6, 80.2, 116.2, 95.4, 99…
$ weight_selfreport <dbl> 81.66969, 72.59528, 93.01270, 79.40109, 96.64247, 7…
$ age               <dbl> 41, 35, 42, 31, 21, 39, 32, 23, 36, 23, 32, 28, 36,…
$ race              <dbl> 1, 1, 2, 1, 2, 1, 2, 1, 1, 1, 1, 1, 2, 1, 1, 2, 1, …

exercise 2

grammatical elements in action

  • open script exercises/Exercise 2a.R
  • read and follow the instructions
  • replace the ___s accordingly
  • run code (not the entire script): CTRL+Enter
  • bonus: exercises/Exercise 2b.R

aesthetics and attributes

e.g. colour, fill, size, shape, alpha

  • attributes take a property
  • aesthetics take variables: inside aes()
ggplot(horses, aes(y = Price, x = Age)) +
  geom_point(colour = "red")
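
The difference between an attribute and an aesthetic shows most clearly when a fixed value ends up inside aes() by mistake. A minimal sketch, assuming a small made-up data frame `toy` as a stand-in for the horses data:

```r
library(ggplot2)

# Hypothetical stand-in for the horses data
toy <- data.frame(Age   = c(3, 5, 8, 12),
                  Price = c(38000, 40000, 12000, 9000))

# Attribute: a fixed property set outside aes(); all points are drawn red
p_attribute <- ggplot(toy, aes(x = Age, y = Price)) +
  geom_point(colour = "red")

# Common slip: inside aes() the string "red" is treated as data (a single
# category), so the colour scale picks its own default and a legend appears
p_mapped <- ggplot(toy, aes(x = Age, y = Price)) +
  geom_point(aes(colour = "red"))
```

Printing `p_attribute` and `p_mapped` side by side makes the difference visible.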

aesthetics and attributes

ggplot(horses, aes(y = Price, x = Age, colour = Sex)) +
  geom_point() 

aesthetics and attributes

ggplot(horses, aes(y = Price, x = Age, shape = Sex)) +
  geom_point()

aesthetics and attributes

ggplot(horses, aes(y = Price, x = Age, colour = Sex, shape = Sex)) +
  geom_point()

aesthetics

typically x, y, colour, fill, size, alpha, linetype, labels

  • some are required by geometries; others are optional
  • continuous vs discrete variables:
    • e.g. shape and label can only be used for categorical values
  • ideally facilitate comprehension

all_aes$geom_point
[1] "x"      "y"      "shape"  "colour" "size"   "fill"   "alpha"  "stroke"
[9] "group" 
all_aes$geom_bar
[1] "x"        "y"        "colour"   "fill"     "size"     "linetype" "alpha"   
[8] "group"   
all_aes$geom_boxplot
 [1] "x"        "y"        "lower"    "xlower"   "upper"    "xupper"  
 [7] "middle"   "xmiddle"  "ymin"     "xmin"     "ymax"     "xmax"    
[13] "weight"   "colour"   "fill"     "size"     "alpha"    "shape"   
[19] "linetype" "group"   
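
The object `all_aes` used above is not defined in these slides and is not part of ggplot2; it was presumably created for the workshop. One way to get a similar listing, as a sketch, is to ask the geom's ggproto object directly:

```r
library(ggplot2)

# Every geom_*() function is backed by a Geom* ggproto object;
# aesthetics() lists the aesthetics that geom understands
GeomPoint$aesthetics()

# Aesthetics the geom cannot work without
GeomPoint$required_aes
```

The same calls work for other geoms, e.g. GeomBar or GeomBoxplot.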

decoding of continuous variables

(Wong 2010, 665)

  • position on a common scale
  • position on the same but nonaligned scales
  • lengths
  • angles, slopes
  • areas
  • volume, monochromatic colour spectrum (saturation, grey scale)
  • pure spectrum colours

decoding of continuous variables

position on common scale

ggplot(data = horses, aes(x = Age, y = Price)) +
  geom_point(size = 3) +
  facet_grid(~Sex)

decoding of continuous variables

position on nonaligned scales

ggplot(data = horses, aes(x = Age, y = Price)) +
  geom_point(size = 3) +
  facet_wrap(~Sex, scales = "free_y")

decoding of continuous variables

colour spectrum

ggplot(data = horses, aes(x = Age, y = Sex, colour = Price)) +
  geom_point(size = 3) 

decoding of continuous variables

area (size)

ggplot(data = horses, aes(x = Age, y = Sex, size = Price)) +
  geom_point() 

decoding of categorical variables (groups)

ggplot(data = horses, aes(x = Age, y = Price, 
                          colour = Sex)) +
  geom_point(size = 3) 

  • qualitative colours, labels, line colours
  • sequential colours, shape outlines, line type
  • filled shapes, hatching (shading with lines), line width

decoding of categorical variables (groups)

ggplot(data = horses, aes(x = Age, y = Price, 
                          label = Sex)) +
  geom_text(size = 3)

  • qualitative colours, labels, line colours
  • sequential colours, shape outlines, line type
  • filled shapes, hatching (shading with lines), line width

decoding of categorical variables (groups)

ggplot(data = horses, aes(x = Age, y = Price,
                          shape = Sex)) +
  geom_point(size = 3)

  • qualitative colours, labels, line colours
  • sequential colours, shape outlines, line type
  • filled shapes, hatching (shading with lines), line width

decoding of categorical variables (groups)

ggplot(data = horses, aes(x = Age, y = Price, 
                          colour = Sex)) +
  geom_smooth(method = "lm", se = FALSE)

  • qualitative colours, labels, line colours
  • sequential colours, shape outlines, line type
  • filled shapes, hatching (shading with lines), line width

decoding of categorical variables (groups)

ggplot(data = horses, aes(x = Age, y = Price, 
                          linetype = Sex)) +
  geom_smooth(method = "lm", se = FALSE)

  • qualitative colours, labels, line colours
  • sequential colours, shape outlines, line type
  • filled shapes, hatching (shading with lines), line width

decoding of categorical variables (groups)

ggplot(data = horses, aes(x = Age, y = Price, 
                          size = Sex)) +
  geom_smooth(method = "lm", se = FALSE)

  • qualitative colours, labels, line colours
  • sequential colours, shape outlines, line type
  • filled shapes, hatching (shading with lines), line width

exercise 3

practice aesthetics and attributes

  • open script exercises/Exercise 3a.R
  • read and follow the instructions
  • replace the ____s accordingly
  • continue with exercises/Exercise 3b.R
  • and exercises/Exercise 3c.R

major visualisation tools

major visualisation tools

~ 50 geometries

  • visual encoding of aesthetics layer
  • geom_…
 [1] "abline"         "area"           "bar"            "bin2d"         
 [5] "blank"          "boxplot"        "col"            "column"        
 [9] "contour"        "contour_filled" "count"          "crossbar"      
[13] "curve"          "density"        "density_2d"     "density2d"     
[17] "dotplot"        "errorbar"       "errorbarh"      "freqpoly"      
[21] "hex"            "histogram"      "hline"          "jitter"        
[25] "label"          "line"           "linerange"      "map"           
[29] "path"           "point"          "pointrange"     "polygon"       
[33] "qq"             "qq_line"        "quantile"       "raster"        
[37] "rect"           "ribbon"         "rug"            "segment"       
[41] "sf"             "sf_label"       "sf_text"        "smooth"        
[45] "spoke"          "step"           "text"           "tile"          
[49] "violin"         "vline"         
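
A listing like the one above can be produced from the installed package itself. A minimal sketch (the exact names and their number depend on the installed ggplot2 version):

```r
library(ggplot2)

# All exported functions whose name starts with "geom_"
geoms <- ls("package:ggplot2", pattern = "^geom_")

# Drop the prefix to keep only the geometry names
geoms <- sub("^geom_", "", geoms)

length(geoms)
head(geoms)
```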

major visualisation tools

~ 50 geometries

  • other packages such as tidybayes and ggridges
  • many can be combined
  • depends on visualisation goal
  • and your subject domain
  • three groups: bivariate and univariate distributions, group comparisons

major visualisation tools

bivariate distribution

  • function: relationship between two variables
  • variable type: typically continuous
  • examples: scatter plot, time series

major visualisation tools

scatter plot

major visualisation tools

time series
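
A time series plot maps time to the x axis and connects the observations with a line. A minimal sketch using the `economics` data set that ships with ggplot2 (not part of the workshop data):

```r
library(ggplot2)

# economics: monthly US economic data; unemploy is the number of
# unemployed in thousands, date is a Date column
p_ts <- ggplot(data = economics, aes(x = date, y = unemploy)) +
  geom_line() +
  labs(x = "Date", y = "Unemployed (in thousands)")

p_ts
```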

major visualisation tools

univariate distribution

  • function: distribution of values
  • variable type: continuous or discrete
  • examples: histograms, density plots, bar plots
ggplot(data = horses, aes(x = Sex)) +
  geom_bar()

major visualisation tools

univariate distribution

  • function: distribution of values
  • variable type: continuous or discrete
  • examples: histograms, density plots, bar plots
ggplot(data = horses, aes(x = Price)) +
  geom_histogram()

major visualisation tools

univariate distribution

  • function: distribution of values
  • variable type: continuous or discrete
  • examples: histograms, density plots, bar plots
ggplot(data = horses, aes(x = Price)) +
  geom_density()

major visualisation tools

group comparisons

  • function: distribution of values for two or more groups (often closely tied to statistical descriptions)
  • variable type: continuous
  • examples: points / jitter, box plot, violin plot, barplot (pie chart), dynamite plots

major visualisation tools

dynamite plot and pitfalls thereof

  • suggest normal distribution?
  • same number of observations in each group?
  • bars suggest data where there are none?
  • are there no values above the errorbar?
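
The pitfalls listed above can be illustrated with two groups that share the same mean but are distributed very differently. A minimal sketch with made-up data; `toy`, `p_dynamite` and `p_points` are names chosen for illustration, and the `fun` argument assumes ggplot2 3.3.0 or later:

```r
library(ggplot2)

# Hypothetical data: same mean in both groups, different shapes
toy <- data.frame(
  group = rep(c("a", "b"), each = 9),
  value = c(1, 2, 3, 4, 5, 6, 7, 8, 9,   # evenly spread
            1, 1, 1, 1, 5, 9, 9, 9, 9)   # clustered at the extremes
)

# Dynamite plot: bar = group mean, error bar = mean +/- 1 standard error
p_dynamite <- ggplot(toy, aes(x = group, y = value)) +
  stat_summary(fun = mean, geom = "bar") +
  stat_summary(fun.data = mean_se, geom = "errorbar", width = .2)

# Plotting every observation reveals what the bars hide
p_points <- ggplot(toy, aes(x = group, y = value)) +
  geom_jitter(width = .1, height = 0)
```

In `p_dynamite` both bars have the same height; only `p_points` shows that group b has almost no values near its mean.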

major visualisation tools

dynamite plots

major visualisation tools

points

major visualisation tools

jittered points

major visualisation tools

jittered points and errorbars

major visualisation tools

box-and-whiskers plot

major visualisation tools

box-and-whiskers plot (Tukey 1977)
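
A box-and-whiskers plot can be combined with the raw observations. A minimal sketch with made-up prices for two groups; `toy` and `p_box` are hypothetical names, and the values are not taken from the horses data:

```r
library(ggplot2)

# Hypothetical prices for two groups
toy <- data.frame(
  Sex   = rep(c("f", "m"), each = 6),
  Price = c(9, 12, 15, 20, 22, 60, 10, 25, 30, 35, 38, 40) * 1000
)

p_box <- ggplot(toy, aes(x = Sex, y = Price)) +
  geom_boxplot(outlier.shape = NA) +   # hide the boxplot's own outlier marks ...
  geom_jitter(width = .1, height = 0)  # ... because all points are drawn anyway

# Tukey's five-number summary for one group, from base R:
# minimum, lower hinge, median, upper hinge, maximum
five_f <- fivenum(toy$Price[toy$Sex == "f"])
five_f
```

Note that geom_boxplot() computes its box from quartiles, which can differ slightly from the hinges returned by fivenum().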

exercise 4

major visualisation tools

  • open script exercises/Exercise 4a.R
  • read and follow the instructions
  • replace the ___s accordingly
  • continue with exercises/Exercise 4b.R

cosmetics

changing text: labs

  • title
  • subtitle
  • caption
  • tag
  • x
  • y
  • colour, shape etc
ggplot(data = horses, aes(y = Price, x = Age,
                          colour = Sex)) +
  geom_point() + 
  labs()

changing text: labs

  • title
  • subtitle
  • caption
  • tag
  • x
  • y
  • colour, shape etc
ggplot(data = horses, aes(y = Price, x = Age,
                          colour = Sex)) +
  geom_point() + 
  labs(title = "My scatter plot")

changing text: labs

  • title
  • subtitle
  • caption
  • tag
  • x
  • y
  • colour, shape etc
ggplot(data = horses, aes(y = Price, x = Age,
                          colour = Sex)) +
  geom_point() + 
  labs(title = "My scatter plot", 
       subtitle = "I'm a subtitle")

changing text: labs

  • title
  • subtitle
  • caption
  • tag
  • x
  • y
  • colour, shape etc
ggplot(data = horses, aes(y = Price, x = Age,
                          colour = Sex)) +
  geom_point() + 
  labs(caption = "Caption for data source")

changing text: labs

  • title
  • subtitle
  • caption
  • tag
  • x
  • y
  • colour, shape etc
ggplot(data = horses, aes(y = Price, x = Age,
                          colour = Sex)) +
  geom_point() + 
  labs(tag = "A")

changing text: labs

  • title
  • subtitle
  • caption
  • tag
  • x
  • y
  • colour, shape etc
ggplot(data = horses, aes(y = Price, x = Age,
                          colour = Sex)) +
  geom_point() + 
  labs(x = "Age of horse", 
       y = "Price of horse in $")

changing text: labs

  • title
  • subtitle
  • caption
  • tag
  • x
  • y
  • colour, shape etc
ggplot(data = horses, aes(y = Price, x = Age,
                          colour = Sex)) +
  geom_point() + 
  labs(colour = "Legend\ntitle:")

changing text: legend (scale)

  • scale_colour_discrete
  • scale_colour_continuous
  • scale_colour_manual
  • or any other aesthetic instead of colour
ggplot(data = horses, aes(y = Price, x = Age,
                          colour = Sex)) +
  geom_point() +  
  scale_colour_discrete(labels = c("female", "male"))

changing text: legend (scale)

  • change colour values manually
  • colour names: link
ggplot(data = horses, aes(y = Price, x = Age,
                          colour = Sex)) +
  geom_point() +  
  scale_colour_manual(labels = c("female", "male"),
        values = c("darkseagreen", "firebrick"))

changing text: legend (scale)

  • change colour values manually
  • colour names: link
ggplot(data = horses, aes(y = Price, x = Age,
                          colour = Sex)) +
  geom_point() +  
  scale_colour_manual(labels = c("female", "male"),
        values = c("darkseagreen3", "firebrick1"))

changing text: legend (scale)

  • change colour values manually
  • colour names: link
  mycolours = c("#000000", "#E69F00", "#56B4E9",
                "#009E73", "#F0E442", "#0072B2", 
                "#D55E00", "#CC79A7")

changing text: legend (scale)

  • change colour values manually
  • colour names: link
ggplot(data = horses, aes(y = Price, x = Age,
                          colour = Sex)) +
  geom_point() +  
  scale_colour_manual(labels = c("female", "male"),
                      values = mycolours[c(1,2)])

changing text: legend (scale)

  • change colour values manually
  • colour names: link
  • ggthemes
ggplot(data = horses, aes(y = Price, x = Age,
                          colour = Sex)) +
  geom_point() +  
  ggthemes::scale_colour_colorblind(labels = c("female", "male"))

changing text: strips

ggplot(data = horses, aes(y = Price, x = Age)) +
  geom_point() + 
  facet_grid(~Sex)

changing text: strips

horses$Sex <- recode(horses$Sex, f = "female", m = "male")
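
Recoding changes the data themselves. If only the strip text should change, the labels can instead be supplied through the labeller. A minimal sketch, assuming a small made-up data frame `toy` in place of the horses data:

```r
library(ggplot2)

# Hypothetical stand-in for the horses data
toy <- data.frame(Age   = c(3, 5, 8, 12),
                  Price = c(38000, 40000, 12000, 9000),
                  Sex   = c("m", "m", "f", "f"))

# A named vector acts as a lookup table: value in the data = strip label
p_strips <- ggplot(toy, aes(x = Age, y = Price)) +
  geom_point() +
  facet_grid(~Sex, labeller = labeller(Sex = c(f = "female", m = "male")))
```

The column Sex keeps its original values "f" and "m"; only the facet strips show the longer labels.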

changing text: strips

ggplot(data = horses, aes(y = Price, x = Age)) +
  geom_point() + 
  facet_grid(~Sex)

changing text: strips

ggplot(data = horses, aes(y = Price, x = Age)) +
  geom_point() + 
  facet_grid(~Sex, labeller = label_both)

themes

  • specify appearance of non-data related ink
[1] "theme_bw"       "theme_classic"  "theme_dark"     "theme_grey"    
[5] "theme_light"    "theme_linedraw" "theme_minimal"  "theme_void"    
  • e.g. ggthemes for more
  • set default: theme_set(theme_minimal())
  • adjust base font: e.g. base_size = 14
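
Setting a default theme affects every plot created afterwards in the session. A minimal sketch showing how to set the default and how to restore the previous one; `old_theme` is a name chosen for illustration:

```r
library(ggplot2)

# theme_set() installs a new default and returns the previous one invisibly
old_theme <- theme_set(theme_minimal(base_size = 14))

# ... all plots created here use theme_minimal with a 14 pt base font ...

# Restore the previous default when it is no longer needed
theme_set(old_theme)
```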

themes

ggplot(data = horses, aes(y = Price, x = Age)) +
  geom_point() + facet_grid(~Sex) +
  theme_grey(base_size = 14)

themes

ggplot(data = horses, aes(y = Price, x = Age)) +
  geom_point() + facet_grid(~Sex) +
  theme_minimal(base_size = 14)

themes

ggplot(data = horses, aes(y = Price, x = Age)) +
  geom_point() + facet_grid(~Sex) +
  theme_light(base_size = 14)

themes

  • axis
  • legend
  • panel
  • plot
  • strip
ggplot(data = horses, aes(y = Price, x = Age)) +
  geom_point() + 
  theme()

themes: axis

  • axis.text
    • axis.text.x
    • axis.text.y
  • axis.title
    • axis.title.x
    • axis.title.y
ggplot(data = horses, aes(y = Price, x = Age)) +
  geom_point() + 
  theme(axis.text = element_text(face = "bold"))

themes: axis

  • axis.text
    • axis.text.x
    • axis.text.y
  • axis.title
    • axis.title.x
    • axis.title.y
ggplot(data = horses, aes(y = Price, x = Age)) +
  geom_point() + 
  theme(axis.title = element_text(face = "bold"))

themes: axis

  • axis.text
    • axis.text.x
    • axis.text.y
  • axis.title
    • axis.title.x
    • axis.title.y
ggplot(data = horses, aes(y = Price, x = Age)) +
  geom_point() + 
  theme(axis.title.y = element_text(face = "bold"))

themes: legend

  • legend.background
  • legend.margin
  • legend.spacing
  • legend.key
  • legend.text
  • legend.title
  • legend.position
  • legend.direction
  • legend.justification
  • legend.box
ggplot(data = horses, aes(y = Price, x = Age, 
                          colour = Sex)) +
  geom_point() + 
  theme()

themes: legend

  • legend.background
  • legend.margin
  • legend.spacing
  • legend.key
  • legend.text
  • legend.title
  • legend.position
  • legend.direction
  • legend.justification
  • legend.box
ggplot(data = horses, aes(y = Price, x = Age, 
                          colour = Sex)) +
  geom_point() + 
  theme(legend.position = "top")

themes: legend

  • legend.background
  • legend.margin
  • legend.spacing
  • legend.key
  • legend.text
  • legend.title
  • legend.position
  • legend.direction
  • legend.justification
  • legend.box
ggplot(data = horses, aes(y = Price, x = Age, 
                          colour = Sex)) +
  geom_point() + 
  theme(legend.position = "top", 
        legend.justification = "right")

themes: legend

  • legend.background
  • legend.margin
  • legend.spacing
  • legend.key
  • legend.text
  • legend.title
  • legend.position
  • legend.direction
  • legend.justification
  • legend.box
ggplot(data = horses, aes(y = Price, x = Age, 
                          colour = Sex)) +
  geom_point() + 
  theme(legend.position = c(.9,.8))

themes: panel

  • panel.background
  • panel.border
  • panel.spacing
  • panel.grid
    • panel.grid.major
    • panel.grid.minor
ggplot(data = horses, aes(y = Price, x = Age)) +
  geom_point() + 
  theme()

themes: panel

  • panel.background
  • panel.border
  • panel.spacing
  • panel.grid
    • panel.grid.major
    • panel.grid.minor
ggplot(data = horses, aes(y = Price, x = Age)) +
  geom_point() + 
  theme(panel.background = element_blank())

themes: plot

  • plot.background
  • plot.title
  • plot.subtitle
  • plot.caption
  • plot.tag
  • plot.margin
ggplot(data = horses, aes(y = Price, x = Age)) +
  geom_point() +
  theme()

themes: plot

  • plot.background
  • plot.title
  • plot.subtitle
  • plot.caption
  • plot.tag
  • plot.margin
ggplot(data = horses, aes(y = Price, x = Age)) +
  geom_point() + 
  theme(plot.background = element_rect(fill = "pink"))

themes: plot

  • plot.background
  • plot.title
  • plot.subtitle
  • plot.caption
  • plot.tag
  • plot.margin
ggplot(data = horses, aes(y = Price, x = Age)) +
  geom_point() + 
  labs(title = "I'm a title") +
  theme(plot.title = element_text(colour = "pink"))

themes: plot

  • plot.background
  • plot.title
  • plot.subtitle
  • plot.caption
  • plot.tag
  • plot.margin
ggplot(data = horses, aes(y = Price, x = Age)) +
  geom_point() + 
  labs(caption = "I'm a caption") +
  theme(plot.caption = element_text(face = "italic"))

themes: plot

  • plot.background
  • plot.title
  • plot.subtitle
  • plot.caption
  • plot.tag
  • plot.margin
ggplot(data = horses, aes(y = Price, x = Age)) +
  geom_point() + 
  theme(plot.margin = unit(c(2,2,2,2), "cm"))

themes: facet strips

  • strip.background
  • strip.placement
  • strip.text
ggplot(data = horses, aes(y = Price, x = Age)) +
  geom_point() + 
  facet_grid(~Sex, labeller = label_both) +
  theme()

themes: strip.background

ggplot(data = horses, aes(y = Price, x = Age)) +
  geom_point() + 
  facet_grid(~Sex, labeller = label_both) +
  theme(strip.background = element_blank())

themes: strip.background

ggplot(data = horses, aes(y = Price, x = Age)) +
  geom_point() + 
  facet_grid(~Sex, labeller = label_both) +
  theme(strip.background = element_rect(fill = "forestgreen"))

themes: strip.text

ggplot(data = horses, aes(y = Price, x = Age)) +
  geom_point() + 
  facet_grid(~Sex, labeller = label_both) +
  theme(strip.background = element_rect(fill = "forestgreen"),
        strip.text = element_text(colour = "white", hjust = 0))

themes: strip.text

ggplot(data = horses, aes(y = Price, x = Age)) +
  geom_point() + 
  facet_grid(~Sex, labeller = label_both) +
  theme(strip.background = element_rect(fill = "forestgreen"),
        strip.text = element_text(colour = "white", hjust = 0, 
                                  face = "bold", size = 16, angle = 180))

saving your plot

  • ggsave("name of plot.png", width = 5, height = 5)
  • types: pdf, png, tiff, jpg
  • sizes require some manual adjustment
  • keep the aspect ratio sensible
  • or export function in plots panel
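
By default ggsave() writes the last plot that was displayed; passing the plot explicitly avoids saving the wrong one. A minimal sketch, assuming a made-up data frame `toy` and a temporary output file (`out_file` is a hypothetical name); the file type is taken from the extension, so ".png" would produce a PNG instead:

```r
library(ggplot2)

# Hypothetical stand-in for the horses data
toy <- data.frame(Age   = c(3, 5, 8, 12),
                  Price = c(38000, 40000, 12000, 9000))

p <- ggplot(toy, aes(x = Age, y = Price)) +
  geom_point()

# Write to a temporary folder; width and height are in inches here
out_file <- file.path(tempdir(), "toy-scatter.pdf")
ggsave(out_file, plot = p, width = 5, height = 5, units = "in")

file.exists(out_file)
```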

exercise 5

bringing everything together

  • Up-to-date COVID-19 data
  • open script exercises/Exercise 5a.R
  • read and follow the instructions
  • replace the ___s accordingly
  • continue with exercises/Exercise 5b.R

closing remarks

useful resources

references

Allison, Truett, and Domenic V. Cicchetti. 1976. “Sleep in Mammals: Ecological and Constitutional Correlates.” Science 194 (4266). American Association for the Advancement of Science: 732–34.

Anscombe, Francis J. 1973. “Graphs in Statistical Analysis.” The American Statistician 27. Taylor & Francis Group: 17–21.

Davis, Caroline. 1990. “Body Image and Weight Preoccupation: A Comparison Between Exercising and Non-Exercising Women.” Appetite 15 (1). Elsevier: 13–21.

Fox, John, and Sanford Weisberg. 2011. An R Companion to Applied Regression. Vol. 2. Sage.

Hartwig, Frederick, and Brian E. Dearing. 1979. Exploratory Data Analysis. 16. Sage.

Matejka, Justin, and George Fitzmaurice. 2017. “Same Stats, Different Graphs: Generating Datasets with Varied Appearance and Identical Statistics Through Simulated Annealing.” In Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems, 1290–4.

Tufte, Edward R. 1983. The Visual Display of Quantitative Information. Cheshire, CT: Graphics Press.

———. 1989. The Visual Display of Quantitative Information. Vols. 13 – 14. Graphics Press.

Tukey, John W. 1977. Exploratory Data Analysis. Vol. 2.

Weisberg, S. 1985. Applied Linear Regression. Vol. 2. New York: John Wiley.

Wickham, Hadley. 2010. “A Layered Grammar of Graphics.” Journal of Computational and Graphical Statistics 19 (1). Taylor & Francis: 3–28.

———. 2016. ggplot2: Elegant Graphics for Data Analysis. Springer.

Wilkinson, Leland. 1999. The Grammar of Graphics. Springer.

Wong, Bang. 2010. “Points of View: Design of Data Figures.” Nature Methods 7 (9). Nature Publishing Group: 665.